Lexical Resources for Hindi Marathi MT

نویسندگان

  • Sreelekha S
  • Pushpak Bhattacharyya
چکیده

In this paper we describe some ways to utilize various lexical resources to improve the quality of statistical machine translation system. We have augmented the training corpus with various lexical resources such as IndoWordnet semantic relation set, function words, kridanta pairs and verb phrases etc. Our research on the usage of lexical resources mainly focused on two ways such as augmenting parallel corpus with more vocabulary and augmenting with various word forms. We have described case studies, evaluations and detailed error analysis for both Marathi to Hindi and Hindi to Marathi machine translation systems. From the evaluations we observed that, there is an incremental growth in the quality of machine translation as the usage of various lexical resources increases. Moreover usage of various lexical resources helps to improve the coverage and quality of machine translation where limited parallel corpus is available.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Bilingual Words and Phrase Mappings for Marathi and Hindi SMT

Lack of proper linguistic resources is the major challenges faced by the Machine Translation system developments when dealing with the resource poor languages. In this paper, we describe effective ways to utilize the lexical resources to improve the quality of statistical machine translation. Our research on the usage of lexical resources mainly focused on two ways, such as; augmenting the para...

متن کامل

Machine Translation, Language Divergence and Lexical Resources

The key concern in machine translation, whose purpose it is to convert documents from one language to another, is the language divergence problem. This problem arises from the fact that languages make different lexical and syntactic choices for expressing an idea. Language divergence needs to be tackled not only for translating between language pairs from distant families (e.g, English and Japa...

متن کامل

Comparison of SMT and RBMT, The Requirement of Hybridization for Marathi – Hindi MT

We present in this paper our work on comparison between Statistical Machine Translation (SMT) and Rule-based machine translation for translation from Marathi to Hindi. Rule Based systems although robust take lots of time to build. On the other hand statistical machine translation systems are easier to create, maintain and improve upon. We describe the development of a basic Marathi-Hindi SMT sy...

متن کامل

Comparison of SMT and RBMT; The Requirement of Hybridization for Marathi-Hindi MT

We present in this paper our work on comparison between Statistical Machine Translation (SMT) and Rule-based machine translation for translation from Marathi to Hindi. Rule Based systems although robust take lots of time to build. On the other hand statistical machine translation systems are easier to create, maintain and improve upon. We describe the development of a basic Marathi-Hindi SMT sy...

متن کامل

That'll Do Fine!: A Coarse Lexical Resource for English-Hindi MT, Using Polylingual Topic Models

Parallel corpora are often injected with bilingual lexical resources for improved Indian language machine translation (MT). In absence of such lexical resources, multilingual topic models have been used to create coarse lexical resources in the past, using a Cartesian product approach. Our results show that for morphologically rich languages like Hindi, the Cartesian product approach is detrime...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • CoRR

دوره abs/1703.01485  شماره 

صفحات  -

تاریخ انتشار 2014